[% setvar title Extend regex syntax to provide for return of a hash of matched subpatterns %]
<div id="archive-notice">
    <h3>This file is part of the Perl 6 Archive</h3>
    <p>To see what is currently happening visit <a href="http://www.perl6.org/">http://www.perl6.org/</a></p>
</div>
<div class='pod'>
<a name='TITLE'></a><h1>TITLE</h1>
<p>Extend regex syntax to provide for return of a hash of matched subpatterns</p>
<a name='VERSION'></a><h1>VERSION</h1>
<pre>  Maintainer: Kevin Walker &lt;<a href='mailto:kwalker@xmission.com'>kwalker@xmission.com</a>&gt;
  Date: 23 Aug 2000
  Last Modified: 30 Sep 2000
  Mailing List: <a href='mailto:perl6-language-regex@perl.org'>perl6-language-regex@perl.org</a>
  Number: 150
  Version: 2
  Status: Frozen</pre>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>Currently regexes return matched subpatterns as a list.  This is
inconvenient in at least two situations: (1) long, complicated regexes,
where counting parentheses can be difficult and error-prone; and, more
importantly, (2) matching against a list of regexes, when the corresponding
fields of the various regexes do not occur in the same order.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>I suggest that (?% field_name : pattern) spit out 'field_name', in addition
to the matched pattern, when matching in a list context:</p>
<pre>     $text = &quot;abajace -- mailbox full&quot;;
	%hash = $text =~ /^ (?% username : \S+) \s*--\s* (?% reason : .*)$/xsi;</pre>
<p>would result in %hash = (username =&gt; 'abajace', reason =&gt; 'mailbox full').</p>
<p>Suggestions for better syntax are hereby solicited.  (?% field_name -&gt;
pattern) and (?% field_name =&gt; pattern) come immediately to mind.</p>
<p>Why This Would be Useful:</p>
<p>Often one wants to match a string against a list of patterns which extract
similar information from the string, but the fields occur in varying orders.
Also, some optional fields might get extracted by some patterns and not by
others.  Continuing with the (over-simplified) example of analyzing e-mail
bounce messages:</p>
<pre>   my @regexps = (

       # 'abajace -- mailbox full' or 'abajace -- user unknown'
       q/^ \s* (?% username  : \S+) \s*--\s* (?% reason : .*)$/,
 
       # 'Unknown local part: flycrake'
       q/^ \s* (?% reason : Unknown\ local\ part): \s* (?% username  : \S+)/,
 
       # 'New address for abajace is <a href='mailto:abajace63@yahoo.com'>abajace63@yahoo.com</a>'
       q/(?% reason : new\ address\ for) \s+ (?% username  : \S+) \s+ is \s+
                (?% new_address : \S+\@\S+)/,

   );

   while (my $bounce_text = get_next_message()) {
       my %field = ();
       for my $regexp (@regexps) {
           if ( %field = $bounce_text =~ /$regexp/xsi;) {
               print &quot;username: $field{username}, reason: $field{reason}\n&quot;;
               if ($field{new_address}) {
                   change_address($field{username}, $field{new_address});
               }
               last;
           }
       }
   }</pre>
<p>Backrefs</p>
<p>It would also be useful to have named backrefs.  I propose that (\%field_name)
match a previous a previous named bracket.  As before, I'm not attached to
the proposed syntax.</p>
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<p>I confess that I'm not an expert in regex internals.  Nevertheless, I'll go
out on a limb and assert that this will be relatively easy to implement,
with relatively few entangling side-issues.</p>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p>See also: RFC 112: Assignment within a regex</p>
</div>
